I Found an Entire Book That Was Written About … Me. It Only Got Weirder From There.
Have you ever stared in a mirror for a few hours? Try it: Watch as your nose somehow shifts placement on your face, how your eyebrows lose symmetry, how quickly you fail to recognize yourself. Facial dysmorphia would come to anyone tasked with considering their own reflection for too long. It's a similar experience when you promote a book. For the past few weeks, I've been touring Canada and the U.S. promoting my latest book, Sucker Punch.
CLIPPER: Compression enables long-context synthetic data generation
Pham, Chau Minh, Chang, Yapei, Iyyer, Mohit
LLM developers are increasingly reliant on synthetic data, but generating high-quality data for complex long-context reasoning tasks remains challenging. We introduce CLIPPER, a compression-based approach for generating synthetic data tailored to narrative claim verification - a task that requires reasoning over a book to verify a given claim. Instead of generating claims directly from the raw text of the book, which results in artifact-riddled claims, CLIPPER first compresses the book into chapter outlines and book summaries and then uses these intermediate representations to generate complex claims and corresponding chain-of-thoughts. Compared to naive approaches, CLIPPER produces claims that are more valid, grounded, and complex. Using CLIPPER, we construct a dataset of 19K synthetic book claims paired with their source texts and chain-of-thought reasoning, and use it to fine-tune three open-weight models. Our best model achieves breakthrough results on narrative claim verification (from 28% to 76% accuracy on our test set) and sets a new state-of-the-art for sub-10B models on the NoCha leaderboard. Further analysis shows that our models generate more detailed and grounded chain-of-thought reasoning while also improving performance on other narrative understanding tasks (e.g., NarrativeQA).
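The compress-then-generate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `compress_then_generate`, the prompt wording, and the `RecordingLLM` stand-in are all assumptions made for illustration, and any real model call would replace the stand-in.

```python
from typing import Callable, Dict, List


def compress_then_generate(chapters: List[str], llm: Callable[[str], str]) -> Dict[str, object]:
    """Two-stage sketch: compress the book, then write a claim from the compressed form."""
    # Stage 1: compress each chapter into an outline, then the outlines into a summary.
    # (Prompt wording is hypothetical, not taken from the paper.)
    outlines = [llm(f"Outline the key events in this chapter:\n{ch}") for ch in chapters]
    joined = "\n".join(outlines)
    summary = llm(f"Summarize the book from these chapter outlines:\n{joined}")

    # Stage 2: generate from the intermediate representations only, never the raw text.
    context = f"Book summary:\n{summary}\n\nChapter outlines:\n{joined}"
    claim = llm(f"Write one claim about the book that can be verified.\n{context}")
    reasoning = llm(f"Explain step by step whether this claim holds: {claim}\n{context}")
    return {"outlines": outlines, "summary": summary, "claim": claim, "reasoning": reasoning}


class RecordingLLM:
    """Hypothetical stand-in for a real model: records prompts, returns numbered replies."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"reply-{len(self.prompts)}"


if __name__ == "__main__":
    fake = RecordingLLM()
    result = compress_then_generate(["Chapter one text.", "Chapter two text."], fake)
    print(result["claim"])
```

Passing the model in as a plain callable keeps the sketch independent of any particular API; the point it shows is only that the claim and reasoning prompts see the outlines and summary rather than the raw chapters.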
An Entire Book Was Written in DNA--and You Can Buy It for $60
As the rate of humanity's data creation increases exponentially with the rise of AI, scientists have been interested in DNA as a way to store digital information. After all, DNA is nature's way of storing data. It encodes genetic information and determines the blueprint of every living thing on earth. And DNA is at least 1,000 times more compact than solid-state hard drives. To demonstrate just how compact, researchers have previously encoded all of Shakespeare's 154 sonnets, 52 pages of Mozart's music, and an episode of the Netflix show "Biohackers" into tiny amounts of DNA.
Statistical Thinking for the 21st Century
The goal of this book is to tell the story of statistics as it is used today by researchers around the world. It's a different story than the one told in most introductory statistics books, which focus on teaching how to use a set of tools to achieve very specific goals. This book focuses on understanding the basic ideas of statistical thinking -- a systematic way of thinking about how we describe the world and use data to make decisions and predictions, all in the context of the inherent uncertainty that exists in the real world. It also brings to bear current methods that have only become feasible in light of the amazing increases in computational power that have happened in the last few decades. Analyses that would have taken years in the 1950s can now be completed in a few seconds on a standard laptop computer, and this power unleashes the ability to use computer simulation to ask questions in new and powerful ways.